ABSTRACT

In this work we explore cyberbullying and other toxic behav- 
ior in team competition online games. Using a dataset of over 
10 million player reports on 1.46 million toxic players along 
with corresponding crowdsourced decisions, we test several 
hypotheses drawn from theories explaining toxic behavior. 
Besides providing large-scale, empirically based understanding
of toxic behavior, our work can be used as a basis for building 
systems to detect, prevent, and counter-act toxic behavior. 
EXECUTIVE SUMMARY

The increasing prevalence of computer mediated communication
(CMC) has brought with it a host of undesirable behavior.
In particular, cyberbullying and other toxic (i.e., “bad”)
behavior has become exceedingly problematic. One of the
most popular forms of CMC is online video games, which
enable real time interaction between hundreds of millions of
players across the world. The unique elements of competitive
online games make players particularly vulnerable to the
exhibition of, and negative effects from, cyberbullying and
toxic behavior.


In this paper, we analyze a few million reports on toxic be- 
havior from the most popular online game in the world, the 
League of Legends (LoL). We find that: 


1. Players are surprisingly not engaged in actively reporting 
toxic behavior. 


2. Engagement can be significantly increased via explicit 
pleas to report. 


3. There are significantly varying perceptions of what consti- 
tutes toxic behavior between those that experienced it and 
neutral 3rd party “judges.” 


4. There are biases with respect to reporting allies vs.
enemies.


5. There are significant cultural differences in perceptions 
concerning toxic behavior. 


6. The result of a match is significantly linked to the
appearance of toxic behavior.


Our findings suggest several avenues for designers of video 
games in particular, and CMC systems in general. 


INTRODUCTION 

With the remarkable advances from isolated console 
games to massively multi-player online role-playing games 
(MMORPG), the online gaming world provides yet another 
place where people interact with each other. The main rea- 
sons that researchers pay attention to online games are 1) that 
the purpose of actions is relatively clear, and 2) that actions 
are quantifiable. A wide range of predefined actions for sup- 
porting social interaction (e.g., friendship, communication, 
trade, enmity, aggression, and punishment) reflects either 
positive or negative connotations among game players [47], 
and they are unobtrusively recorded by game servers. These 
rich electronic footprints enable and encourage research of 
social dynamics [22, 28, 47]. 


There is no end to the growth of online gaming in sight. For 
example, in 2011 the First Dota 2 International Tournament 
had a prize pool of $1,600,000. In 2014, the prize pool started 
again at $1.6M, but a crowdfunding mechanism brought it
up to just under $11M. For comparison, the winners of the 
2014 International received $5M ($1M for each team mem- 
ber), while the winner of the 2014 US Open Men’s tennis 
tournament received $3M. In other words, video games have 
reached a level of sophistication and competitiveness that not 
only can gamers make a living playing them, they can make 
quite a comfortable living. 


While the widely adopted game design element of competition
increases enjoyment [52], it has also led to an increasing
concern about negative behavior. In online gaming, negative
behaviors such as cyberbullying [46], griefing [26],
mischief [31], and cheating [10] are often grouped together and
called toxic behavior. Unfortunately, the definition of toxic
behavior is often unclear [16] due to differences in expected
behavior, customs, rules, and ethics across games [53]. Such
ambiguity and subjective perception of griefing mean that griefers
themselves sometimes fail to recognize what they did [34].


To make matters worse, since online games are very popu- 
lar in the younger generation [5], instances of cyberbullying 
can cause far reaching problems. In general, cyberbullying is
associated with depression and anxiety, and has been shown to
result in drastic actions such as suicide in several well
publicized cases [6]. With the amount of time and energy players
invest into games, victims of toxic behavior are likely to feel
emotional effects that persist into the real world.


In this paper, we make use of a large-scale dataset capturing 
millions of instances of toxic behavior perpetrated by
hundreds of thousands of accused toxic players from the League
of Legends (LoL), the world’s most popular online game [4].
The richly detailed cases are augmented by crowdsourced 
decisions on whether or not the accused was in fact toxic. 
Drawing from sociology and psychology literature, we ex- 
plore toxic behavior through the lens of competitive online 
games. 


BACKGROUND 

In this section we begin with a basic description of LoL and 
the Tribunal, its crowdsourcing platform for addressing toxic 
behavior. We highlight the differences between LoL and 
other online games from the perspective of team formation, 
the goal of the game, and the common game modes. We then 
move on to how Riot Games attempts to detect toxic players 
and which behavior is considered toxic. Lastly, we explain 
how the Tribunal system works. 


League of Legends as a Team Competition Game 

The League of Legends (LoL), a Multiplayer Online Battle 
Arena (MOBA), is arguably the most popular online game in 
the world. The developer, Riot Games, recently announced 
there are 67 million players per month, 27 million players per 
day, and over 7.5 million concurrent players at peak times [2]. 
LoL is an online team competition game whose goal is to pen- 
etrate and destroy the enemy’s central base, called the Nexus. 
In contrast to massively multiplayer online role-playing games
(e.g., World of Warcraft), LoL is a match-based team
competition game, where a single match usually takes around 30 to
40 minutes. Although LoL provides a few different modes of 
matches, all the modes are competitions between two teams. 
In the most popular game mode, each team has five members. 
A player is given the option to form a team with friends be- 
fore the match. Otherwise, the player is randomly assigned to 
a team with strangers who have similar skill levels. 


Player Reports on Toxic Behavior 

After a match, players see a scoreboard as in Figure 1. In 
Figure 1, (A) is the match summary, (B) lists players who 
played the game together, and (C) is a chat window. Users can 
report toxic players by clicking the rightmost red button in 
(B). Each player can submit one report for every other player 
per game. We stress that player reports can only be submitted 
after the match is completed, and at a later date reviewers vote 
as described in the next section. Therefore, player reports are
not considered a strategic element during the match itself. 





Figure 1. LoL scoreboard after a match. 


When reporting, players choose from several predefined 
categories of toxic playing: assisting enemy team, inten- 
tional feeding [suicide], offensive language, verbal abuse, 
negative attitude, inappropriate name, spamming, unskilled 
player, refusing to communicate with team, and leaving the 
game/AFK [away from keyboard]. Detailed explanations of 
each type of toxic playing are given in [3]. Each report is 
submitted under one category. We further classify these cate- 
gories as either cyberbullying (offensive language and verbal 
abuse) or domain specific toxicity (the remaining categories). 


While most of the categories are self-explanatory, two of the
domain specific categories need additional explanation:
intentional feeding and leaving the game. Intentional feeding
indicates that the player died to the opposing team on purpose,
often repeatedly. Leaving the game/AFK refers to instances 
when a user is inactive through the entire match. These two 
categories of toxic behavior usually make the opposing team 
stronger due to the game’s design. 


To understand the motivation behind intentional feeding and 
leaving the game, we present one common scenario where 
such toxic play happens. A LoL match is concluded when 
the Nexus of one team is destroyed. If one team is obviously 
at a disadvantage during the game, the team may give up the 
match by voting to “surrender”. However, surrender is ac- 
cepted only when at least four out of five players on the same 
team agree to do so. When the surrender vote fails, players 
who voted for surrender may lose interest in continuing the 
game. Then, they might exhibit extreme behavior, such as in- 
tentional feeding or leaving the game/AFK, in an attempt to 
force the game to finish earlier or convince other players to 
cast a vote for surrender. 


LoL Tribunal as a Crowdsourcing System 

The LoL Tribunal is a crowdsourcing system to make de- 
cisions on whether reported players should be punished or 
not. Only players that are reported more than a few hundred 
times are brought to the Tribunal [11]. Once this threshold 
is reached, up to 5 randomly selected matches that an
accused player was reported in are aggregated into a case.
Reviewers, who are LoL players with enough time invested to
be considered “experts,” are then presented with a rich set
of information about the case. They see the full chat logs 
from each match, the performance (similar to the end score 
board) of each player in the game, the duration of the match, 
and the category (and optional comments) that the player was 
reported for. To ensure an unbiased result, all players are 
anonymized in the match data. We note that among the 10
predefined categories of toxic behavior, the Tribunal excludes
reports of unskilled player, refusing to communicate with team,
and leaving the game.


It is known that about 100-150 reviewers cast votes for a sin- 
gle case [11], and a majority voting scheme is used to reach a 
verdict. Guilty verdicts can result in a ban for 1 week, a few 
months, and sometimes longer suspensions/permanent bans. 
For example, professional players have received lengthy,
career impacting suspensions due to Tribunal decisions [1].
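
The majority voting step described above can be sketched in a few lines. This is only an illustration: the function name, the vote labels, and the rule that a tie counts as a pardon are our assumptions, since the exact decision rule used by the Tribunal is not published.

```python
from collections import Counter


def tribunal_verdict(votes):
    """Return 'punish' or 'pardon' by simple majority over reviewer votes.

    Assumption: a tie (or no strict majority for 'punish') is treated as a
    pardon. The source does not state how ties are resolved.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    return "punish" if counts["punish"] > len(votes) / 2 else "pardon"


# Hypothetical case reviewed by 120 players.
votes = ["punish"] * 70 + ["pardon"] * 50
print(tribunal_verdict(votes))
```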


DATA COLLECTION 

We collect all information available in the Tribunal: player 
reports, the crowdsourced decision, and detailed match logs. 
This rich collection is the basis for our quantitative analysis. 


Early studies on moral behavior in sports largely depend on 
self-reports [41], but they suffer in quality and scalability 
from bias in the human recall process [35]. Furthermore, 
self-reports in particular can be influenced by social desir- 
ability, the tendency to behave in a way that is socially prefer- 
able [29]. In this sense, reports submitted by 3rd parties, as in 
the Tribunal, are more reliable since they eliminate the bias 
of self-report. 


Another widely used method is asking individuals about hy- 
pothetical situations, but it also has the same social desirabil- 
ity bias, and additionally has limitations due to the speci- 
ficity of scenarios and scalability [42]. Our huge collection 
of witness reports on observed toxic behavior helps address 
these problems. Furthermore, crowdsourced decisions offer 
an even more objective viewpoint on reported toxic behavior. 


Riot Games divides the world into several regions and main- 
tains dedicated servers for each region. We focus on three 
regions, North America (NA), Western Europe (EUW), and 
South Korea (KR) by considering representativeness of cultural
uniqueness and familiarity to the authors. Although a
player may connect to servers operated in different regions,
a greater distance between a player and a server usually
results in increased latency in Internet connections, sacrificing
the responsiveness and interactivity of online
games. We thus reasonably assume most players connect to
the servers corresponding to their real-world region for the
best quality of service.


In April 2013, we collected about 11 million player reports 
from 6 million matches on 1.5 million potentially toxic play- 
ers across three regions. We collect all available data from 
the servers and summarize it in Table 1. We first note that 
the KR portion of our dataset is smaller than other regions 
because the KR Tribunal started in November 2012 but the 
EUW and NA Tribunals started in May 2011. Next, since 
player reports are internally managed, it is not easy to
measure our dataset’s completeness. However, as reviewers can
see their past votes and final decisions at a later time, it makes 
sense that no votes or reports are removed from servers and 
we have what amounts to a complete picture of the Tribunal. 





         NA          EUW         KR          TOTAL
Player   590,311     649,419     220,614     1,460,344
Match    2,107,522   2,841,906   1,066,618   6,016,046
Report   3,441,557   5,559,968   1,893,433   10,898,958


Table 1. Summary of our dataset. 


RESEARCH QUESTIONS AND HYPOTHESES 

In this section we formulate our research questions on how a 
toxic player behaves as well as how other players react to the 
toxic player. 


At its core, LoL is a team competition game. Although com- 
petition against human opponents provides a high degree of 
player enjoyment [54], it can also result in toxic behavior. 
The potential for toxic behavior can be further exacerbated 
by the anonymous aspects of LoL, as people do not feel ac- 
countable for their toxic behavior when anonymous and can 
actually increase their aggressive behavior [17]. We present 
several compelling theories from sociology and psychology 
that describe the “whys” of toxic behavior. Our large-scale 
dataset gives the opportunity to apply these theories “in the 
wild.” 


We begin with an explanation of the bystander effect and the 
vague nature of toxic playing. Next, we discuss the concepts 
of in-group favoritism and out-group hostility with respect 
to competition. We then describe how intra-group conflict 
emerges in a team competition environment like LoL and re- 
lated theories which indicate a socio-political effect on how 
toxic behavior is both expressed and perceived. Finally, we 
consider the effects of team-cohesion on performance, which 
can provide insights into what might trigger toxicity in online 
games in particular. 


Bystander Effect and Vague Nature of Toxic Playing 

Our first research question is focused on how actively people 
report toxic playing. It is essential to understand the effective- 
ness and design implications of a system for counter-acting 
toxic playing, such as the Tribunal. 


RQ1a: How active are players in reporting toxic behavior?


This research question is heavily related to the social psy- 
chology concept of the bystander effect [9], which describes 
the tendency for observers to avoid helping a victim, particu- 
larly when they are immersed in a group. Considering LoL’s 
anonymous setting with ephemeral teams, the bystander ef- 
fect can influence how other players react to a toxic player. If 
the bystander effect is valid in this setting, then most players 
in a match will not report the toxic player even though they 
directly witnessed the abuse. We explore how actively players 
report their observation of toxic behavior via our dataset. 


Interestingly, the bystander effect is known to be mitigated 
with explicit pleas for aid [33]. We can thus draw a testable 
hypothesis. 


HYPOTHESIS 1.1. If there is a request asking to report 
toxic players, the number of reports is increased. 


RQ1b: How does the vague nature of toxic playing affect 
tribunal decisions? 


Additionally, players have different perceptions on the sever- 
ity of toxic behavior and thus sometimes fail to recognize its 
presence [34]. This vague nature of toxic playing might af- 
fect both reporting and reviewing. We expect that different 
perceptions between players who report toxic behavior and 
Tribunal reviewers will result in a number of pardons. 


In-group Favoritism and Out-group Hostility 
The next question we explore is related to the team-based, 
competitive gameplay of LoL. 


RQ2: What is the difference between reporting behavior of 
the toxic player’s teammates and his opponents? 


In-group favoritism is simply the tendency of people to fa- 
vor in-group members (e.g., teammates) over similarly lik- 
able out-group members (e.g., opponents) while disliking out- 
group members when compared to similarly dislikable in- 
group members [51]. This is also related to homophily [37], 
which is the tendency for similar individuals to form relation- 
ships. 


Deindividuation theory explains how an individual loses the 
concept of both self and responsibility when immersed in 
a crowd [21], which provides a possible mechanism be- 
hind in-group favoritism. This foundation meshes with the 
unique characteristics of computer-mediated communication 
(CMC): anonymity and reduced self-awareness [15, 48]. 


Reicher et al. [43] proposed the Social Identity Model of 
Deindividuation Effects (SIDE) as a way of explaining effects 
that classic deindividuation theory could not. In short, they 
discovered that simply being part of an anonymous crowd 
was insufficient impetus for displaying anti-normative behav- 
ior. Instead, in-group identity alone becomes relevant only 
in comparison to a relevant out-group. In other words, both 
anonymity and context drive deindividuation.


Even though the groupings in LoL are ephemeral and mostly 
anonymous, the two teams are in direct competition and only 
one team can win: the thrill of victory or the agony of de- 
feat is shared by all members of a team. This configuration 
catalyzes group identification and provides clear cut divisions 
of in- and out-groups in a context that is driven by out-group 
hostility; i.e., players are literally attempting to defeat the op- 
posing team. Thus, we expect to see empirical evidence of 
in-group favoritism in reporting of some toxic behavior that 
equally affects both teams. 


HYPOTHESIS 2.1. For toxic behavior that affects both 
teams equally, in-group members (teammates) are less likely 
to submit reports when compared to out-group members (op- 
ponents). 


Intra-group Conflicts and Socio-political Factors 

Although we expect to find evidence of in-group favoritism 
and out-group hostility, the team competition setting of LoL 
might also lead to substantial intra-group conflict. Intuitively, 
it is easy to blame overall team performance on a single 
poorly performing player. Unlike the aforementioned concept 
of in-group favoritism, the poorly performing player does not 
affect both teams equally because he makes the opponent 
team relatively stronger. Thus, he becomes a target to blame. 


According to the classification scheme of human society in- 
troduced by Tönnies [49], teams in LoL are closer to task- 
oriented associations (Gesellschaft) than the social commu- 
nity associations (Gemeinschaft). In task-oriented associa- 
tions, the relationship among players is somewhat impersonal 
and social bonding does not necessarily exist. Thus, toxic
players might not feel a sense of team and might feel no qualms
about harassing teammates, whom they see as hurdles to winning,
rather than crediting the enemy for beating their team.


While it is difficult to make direct conclusions about intra- 
group conflict, LoL players are generally segregated by what 
amounts to socio-political regions. The large world-wide 
user-base of LoL makes it feasible to study regional
differences in such intra-group conflicts, which leads to our next
research question.


RQ3: What is the impact of socio-political factors on toxic 
behavior? 


There are several studies to support that the degree to which 
bullying occurs is influenced by socio-political factors [8, 14]. 
Chee reports on a unique Korean gaming culture called Wang- 
tta [14]. In the context of gaming, Wang-tta describes the phe- 
nomenon of “isolating and bullying the worst game player in 
one’s peer group.” Wang-tta is thought to be modeled after 
the similar Japanese term Ijime [8], which describes the
comfort members of collectivist societies feel from similarity and
the abuse thrust upon those that are different.


Conversely, other socio-political regions such as North America
and Western Europe tend to be individualistic, with a
focus on relying on one’s self [27]. In such socio-political
regions, while bullying of poorly performing players
certainly occurs, there is less ingrained hostility towards another
player’s poor individual performance [39]. For example,
consider the counterpart to the cliché “there’s no ‘I’ in team,”
“well there ain’t no ‘WE’ either.” There is simply more
focus on my performance as opposed to our performance. If
there are in fact socio-political factors at play, then we would 
expect to see this reflected in reports on toxic behavior from 
different regions, leading to a testable hypothesis. 


HYPOTHESIS 3.1. Due to a more group-success oriented 
socio-political environment, cyberbullying offenses are less 
likely to be punished in Korea than in other regions. 


Collectivist societies such as Korea and Japan have a desire 
towards similarity, while differences are a source of dispar- 
agement [39]. In addition, collectivism, by definition, places 
more emphasis on cooperation, the group goal, and a sense of 
belonging than individualism does [18, 50]. That is,
deliberately harming the group is anathema to the highly held social
value of cooperation and is met with intense derision. Thus, 
we propose the next two hypotheses from the perspective of 
reporters and reviewers. 


HYPOTHESIS 3.2. Reports on toxic behavior that largely 
affects the result of the match are more often submitted in 
Korea than in other regions. 


HYPOTHESIS 3.3. Reports on behavior that largely af- 
fects the result of the match are more likely to be punished 
in Korea than in other regions. 


Team-cohesion and Performance 

The cohesion-performance relationship in sports has been 
studied for decades. Several studies have confirmed a moder- 
ate but significant effect of cohesion on performance, with the 
effect varying according to the type of sport, the mechanism 
of team building, and the gender of players [13, 38]. More 
generally, Felps et al. discuss how a single negative member 
can bring group-level dysfunction [25]. The negative member 
violates interpersonal social norms and thus can lead to neg- 
ative emotions and reduced trust among teammates. These 
psychological states then trigger defensive reactions and in- 
fluence the overall functioning of the group. Intuitively, toxic 
behavior is likely to have a negative effect on team-cohesion, 
and thus performance, leading to our next research question. 


RQ4: What is the relationship between toxic behavior, player 
reports, and team performance? 


Weiner proposes attribution theory which states that individ- 
uals are likely to search for causal factors of failure, consid- 
ering even innocuous factors as significant [55]. Naquin and 
Tynan present a similar concept, the team halo effect [40], 
which describes the tendency of people to give credit for suc- 
cess to the team as a collective, but to blame other people for 
poor team performance. 


The underlying cognitive process is known as counterfactual 
thinking [44], which is when we create a mental simulation 
built on events that are contrary to the facts of what really hap- 
pened. If an individual deduces a high likelihood of changing 
the outcome to a more positive one via the mental simulation, 
then the difference between simulation and reality is identi- 
fied as the causal factor for the negative outcome. 


Generally speaking, counterfactual thinking helps us accu- 
rately identify causal factors of outcomes, but it can be bi- 
ased by numerous factors [23], such as the halo effect. For 
example, if one player makes a poor decision, then a toxic 
player might imagine what would have happened if that deci- 
sion was not made, attributing the current state of the match 
to that singular incident regardless of any mistakes he or other 
players might have made. Similarly, after the match is com- 
pleted, a negative outcome and counterfactual thinking might 
lead bullied players to seek revenge on toxic players as a de- 
fensive reaction to perceived victimization and a method by 
which to gain some emotional satisfaction [20]. 


Since theory indicates negative outcomes trigger both toxic 
behavior and attempts to punish said toxic behavior, but
reviewers of Tribunal cases are unbiased, we propose the
following hypotheses.


HYPOTHESIS 4.1. More reports come from matches 
where the accused was on the losing team. 


Also, cases that stem from such aggressive reporting might be
more likely to be pardoned.


HYPOTHESIS 4.2. There are more cases pardoned when 
the accused was on the losing team than on the winning team. 


RESULTS 


Bystander Effect and Vague Nature of Toxic Playing 

First, we look into how actively people report toxic playing. 
When toxic play occurs in LoL, either 4 players (allies of the 
toxic player) or 9 players (all players but the toxic player) are 
exposed. For instance, if a toxic player is verbally abusive or 
uses offensive language in the ally chat mode, it is only visible 
to his allies. However, he can also use the all chat mode, 
exposing all 9 players to toxicity. For the other categories, all 
players are exposed in essentially the same manner. 

Figure 2. Distribution of the number of reports per match.


Figure 2 plots the distribution of reports per match
depending on the type of behavior reported for each region. Across
all regions, the mean and median number of toxic reports per
match are 1.812 and 1, respectively. (Every match in our
dataset has at least one player report, as only players that
have been reported appear in the Tribunal.) That is, no more
than 2 players per match report toxic playing on average.
Overall, intentional feeding matches have the highest number of
reports (about 50% of cases have more than 1 report, with
over 60% for KR and EUW in particular), but the majority
of matches have fewer than 3 reports. This is extremely low
compared to the number of exposed players. That is, on average,
LoL players do not actively report toxic players.


To test [H1.1] If there is a request asking to report toxic
players, the number of reports is increased, we look for situations
where a request to report is sent to players on the enemy team;
i.e., those that are not negatively affected by the toxic player,
yet are likely to recognize his behavior and react if a plea is
made. An explicit request sent to the opposite team about a
player who is intentionally feeding or assisting the enemy
satisfies this condition.


To test this, we define two dichotomous variables: 1) whether 
an explicit request exists, and 2) whether members of the op- 
posite team report intentional feeding or assisting the enemy. 
We set the former variable to 1 when the word “report” ap- 
pears in all chat, and 0 otherwise. We then create a 2x2 ma- 
trix of these variables. A Chi-Square test with Yates’ con- 
tinuity correction reveals that the percentage of reporting by 
enemies significantly differs with the existence of explicit
requests (χ²(1, N = 580,480) = 194552.9, p < .0001).


Surprisingly, through the odds ratio we discover that the odds
of opponents reporting are 16.37 times higher when
allies request a report. This supports H1.1: explicit requests to
report toxic players highly encourage enemies to report toxic 
players even if toxic behavior is beneficial to the enemies. In 
other words, we find the bystander effect is neutralized via 
explicit requests for help. This is interesting because oppos- 
ing players can benefit from toxic playing, and their typical 
behavior of not reporting is changed due to explicit requests. 
This finding suggests that interaction design should actively 
encourage players to report others’ toxic playing. 
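
For readers who wish to reproduce this kind of analysis, the following is a minimal sketch of a Chi-Square test with Yates’ continuity correction and the corresponding odds ratio for a 2x2 table. The cell counts are hypothetical placeholders, not the counts from our dataset, and the function names are ours. Rows are assumed to be "request present / absent" and columns "enemy reported / did not report".

```python
import math


def yates_chi_square(a, b, c, d):
    """Chi-square with Yates' continuity correction for the 2x2 table
    [[a, b], [c, d]]. Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs a nonzero total")
    # Yates' correction shrinks |ad - bc| by n/2, floored at zero.
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square tail is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def odds_ratio(a, b, c, d):
    """Odds of the column-1 outcome in row 1 relative to row 2."""
    return (a * d) / (b * c)


# Hypothetical counts: (request, reported), (request, not reported),
# (no request, reported), (no request, not reported).
a, b, c, d = 30, 10, 20, 40
stat, p = yates_chi_square(a, b, c, d)
print(f"chi2 = {stat:.3f}, p = {p:.2e}, odds ratio = {odds_ratio(a, b, c, d):.2f}")
```

Note that the odds ratio compares odds, not probabilities, so it should be read as "the odds of an enemy report are that many times higher when a request is present".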


We next explore the possible impact of the vague nature of 
toxic playing on reporting and reviewing. First, we look into 
an association between the recognizability of toxic playing 
and the number of reports. Among the 7 types of toxic play- 
ing in LoL, intentional feeding and assisting the enemy team 
are, generally speaking, much more concrete expressions of 
toxicity than other types. The average number of reports per 
match for these two categories is higher than that for other 
types, as shown in Table 2. Both categories are consistently 
the first and the second ranked in terms of the average
number of reports per match across all regions. A Kruskal-Wallis
test revealed a significant effect for category of toxic behavior
on the number of reports per match (χ²(6) = 117,399.1, p <
.0001). Further post-hoc statistical testing using the pairwise 
Wilcoxon test with Bonferroni correction showed a signifi- 
cant difference between categories (p < .0001). 
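
As an illustration of the test used here, the sketch below computes the Kruskal-Wallis H statistic (with the standard tie correction) and the number of pairwise comparisons that a Bonferroni correction must account for. The sample groups are toy data, not our report counts, and the helper names are ours. With 7 categories the statistic is compared against a chi-square distribution with 6 degrees of freedom.

```python
import math


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with tie correction.

    groups: list of non-empty lists of numeric observations.
    """
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of = {}   # value -> average rank (ties share the mean rank)
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t ** 3 - t
        i = j
    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12 / (n * (n + 1)) * weighted - 3 * (n + 1)
    return h / correction


def pairwise_comparisons(k):
    """Number of pairwise post-hoc tests among k groups."""
    return math.comb(k, 2)


# Toy data: three groups with no overlap.
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
# Bonferroni: divide the significance level by the number of comparisons.
print(0.05 / pairwise_comparisons(7))
```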


A.E     I.N     I.F     N.A     O.L     S       V.A
1.876   1.602   2.091   1.696   1.691   1.769   1.740


Table 2. Avg. number of reports per match for each type of toxic playing. 





Second, we look at how many reported toxic players are par- 
doned in Figure 3. Since players come to the Tribunal only 
after a few hundred reports against them, we do not expect 
to see a 50/50 punish/pardon ratio. We obtained records for 
477,383 punishments (80.9%) and 112,930 pardons (19.1%) 
for NA, 559,449 (86.0%) and 90,966 (14.0%) for EUW, and 
187,253 (85.0%) and 32,929 (15.0%) for KR across all cate- 
gories of toxic playing. The highest pardoned ratio we found 
for a specific category, spamming, is 26.1% in KR. Being
reported means that players regard it as toxic playing, and a
pardon means that reviewers do not regard it as toxic.
Therefore, a 26% pardoned rate shows that different perceptions
exist: for 1 in 4 cases, reviewers do not find the player toxic.
This high pardoned ratio confirms the impact of the vague
nature of toxic playing in reviewing as well as in reporting.


Figure 3. Proportion of decisions according to categories of toxic playing.
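
The per-region punish/pardon percentages quoted above follow directly from the raw counts. A short sketch, using only the counts reported in the text (the variable and function names are ours):

```python
# (punished, pardoned) counts per region, as reported in the text.
decisions = {
    "NA": (477_383, 112_930),
    "EUW": (559_449, 90_966),
    "KR": (187_253, 32_929),
}


def pardon_rate(punished, pardoned):
    """Fraction of decided cases that ended in a pardon."""
    return pardoned / (punished + pardoned)


for region, (punished, pardoned) in decisions.items():
    print(f"{region}: {pardon_rate(punished, pardoned):.1%} pardoned")
```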


In-group Favoritism and Out-group Hostility 

Our next research question is understanding the difference be- 
tween reporting behavior of the toxic player’s teammates and 
his opponents. To test [H2.1] For toxic behavior that affects 
both teams equally, in-group members (teammates) are less 
likely to submit reports when compared to out-group members 
(opponents), we carefully revisit the definition of in-group 
favoritism. The key concept of in-group favoritism is that
“similarly” likable or unlikable members of one’s own group are
viewed more favorably. We find this kind of relatively “neutral” toxic
behavior from reports of inappropriate name. The inappropri- 
ate name of a toxic player is visible to all players equally, and 
thus the impact of the inappropriate name is neutral to both 
teams. 

Figure 4. Number of matches that report each category of toxic behavior.


In Figure 4 we plot the number of matches reported for each 
category of toxic behavior per region. As we hypothesized, 
the only category in which the number of reports from
enemies is higher than from allies is inappropriate name (ally
only: 16,339, enemy only: 23,966, across regions). This
indicates that allies are more forgiving, and thus less likely to
report, than opponents in cases where the toxic player’s of- 
fense is neutral to both teams. This is exactly what in-group 
favoritism describes, and thus H2.1 is supported. 


Intra-group Conflicts and Socio-political Factors 

We now explore our next research question on the relation- 
ship between intra-group conflicts and socio-political factors. 
Figure 4 shows the number of matches that are reported due 
to each category of toxic behavior. The horizontal bar for 
each category divided into three parts shows the number of 
matches reported by ally-only, enemy-only, and both, respec- 
tively from left to right. For instance, about 200 thousand 
matches are reported by ally-only due to assisting enemy, 
12,106 matches by enemy-only, and 44,927 matches by both 
in EUW. We find that most reports come from allies rather
than from enemies, which is evidence of intra-group conflicts.


In Figure 4, offensive language is the most reported toxic be- 
havior in LoL and is highly reported by allies. The chat fea- 
ture is designed for exchanging strategies and sharing emo- 
tions, and thus serves to foster a sense of belonging in the 
team. In practice however, it becomes a channel for toxic 
players to harass other ally players [32]. 


We additionally note that Riot Games has disabled all chat
by default in newly installed clients since April 2012, while
the Tribunal began in May 2011. Although players can easily
turn the all chat option on in the game configuration with a
single click, we expect that this change decreased verbal
interaction across teams after April 2012.


To confirm that this is not the reason that players are more
likely to report allies than opponents for verbal abuse or
offensive language, we look into the temporal trend of the
toxic reports for those two categories. We divide toxic reports
into fixed-size time windows, each defined as 1,000 consecutive
numeric identifiers of Tribunal cases. We compute the proportion
of toxic reports that come from allies vs. those from opponents
in each time window and find that, although it varies over
time, the number of reports by allies is consistently higher
than that by opponents. This shows that frequent intra-group
conflicts through chat are not an artifact of the user
interface.
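
The windowing step can be sketched in Python as follows. This is a minimal illustration under assumptions: the record layout (`case_id`, reporter side) and the function name are hypothetical, and the sample data is a toy example rather than Tribunal data.

```python
from collections import defaultdict

WINDOW = 1000  # consecutive Tribunal case identifiers per window, as in the text

def ally_share_by_window(reports, window=WINDOW):
    """Map window index -> share of reports filed by allies.

    `reports` is an iterable of (case_id, side) pairs, where side is
    "ally" or "enemy" (a hypothetical record layout).
    """
    counts = defaultdict(lambda: {"ally": 0, "enemy": 0})
    for case_id, side in reports:
        counts[case_id // window][side] += 1
    shares = {}
    for idx in sorted(counts):
        c = counts[idx]
        shares[idx] = c["ally"] / (c["ally"] + c["enemy"])
    return shares

# Toy data, not taken from the paper.
sample = [(12, "ally"), (40, "ally"), (999, "enemy"), (1000, "ally"), (1500, "enemy")]
shares = ally_share_by_window(sample)
```

A share above 0.5 in every window would correspond to the pattern described above, in which allies consistently file more reports than opponents.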


To test [H3.1] Due to a more group-success oriented socio- 
political environment, cyberbullying offenses are less likely 
to be punished in Korea than in other regions, we examine 
the rate of pardons for cyberbullying offenses in the different 
regions. 


Figure 5 plots the percentage of cyberbullying reports (offen- 
sive language and verbal abuse) that result in pardons for each 
region. 17.1% of such reports are pardoned in KR, compared 
to 14.3% in NA and 9.7% in EUW. A Chi-Square test with 
Yates continuity correction reveals that the effect of region on
pardons in those categories is significant (χ²(2, N = 3,108,172)
= 24123.3, p < .0001). Thus H3.1 is supported.
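
For readers who want to see the mechanics of such a test, the Python sketch below computes a Pearson chi-square statistic for a region-by-decision table. The counts are hypothetical: they mirror the reported pardon rates per 1,000 reports and are not the paper's data, so the statistic will not match the value above. The sketch uses the uncorrected statistic, and the function name is ours.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts (rows: KR, NA, EUW; columns: pardoned, punished).
table = [[171, 829], [143, 857], [97, 903]]
stat, dof = chi_square_independence(table)
# With exactly 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-stat / 2) if dof == 2 else None
```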


As we previously noted, a likely explanation for this is the
Wang-tta concept in KR. Particularly invasive in gaming
culture, Wang-tta probably leads to reviewers empathizing
not with the cyberbullying victim, but rather with the alleged
toxic player who verbalized his displeasure with the victim’s
performance.

Figure 5. The percentage of cyberbullying reports that result in pardons
for each region.


To test [H3.2] Reports on toxic behavior that largely affects
the result of the match are more often submitted in Korea than 
in other regions, we compare the percentage of reports for in- 
tentional feeding or assisting enemy across regions since such 
toxic behavior directly contradicts the group-success goal. 


We find support for H3.2 by looking at the mean and median 
of reports for assisting enemy or intentional feeding coming 
from teammates (1.482 and 1, 1.714 and 2, and 1.75 and 2 for 
NA, EUW, and KR, respectively). A Kruskal-Wallis test reveals
that the effect of region on reports coming from teammates
is significant (χ²(2) = 31175.43, p < .0001). A post-hoc test
using Mann-Whitney tests with Bonferroni correction con- 
firms the significant differences between NA and EUW (p < 
.0001, r = .129), between NA and KR (p < .0001, r = .148), 
and between EUW and KR (p < .0001, r = .02).

Figure 6. The percentage of reports for behavior that directly affects the
result of the match for each region that results in a punishment.
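
The Kruskal-Wallis statistic used for H3.2 compares groups by rank rather than by raw value. The Python sketch below is a minimal, tie-corrected implementation for illustration only. The paper does not describe its tooling, so the function name and the toy samples are assumptions.

```python
from collections import Counter

def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of samples."""
    counts = Counter(x for g in groups for x in g)
    n_total = sum(counts.values())
    # Give each distinct value the average of the ranks it occupies (midrank).
    rank_of = {}
    start = 1
    for value in sorted(counts):
        t = counts[value]
        rank_of[value] = start + (t - 1) / 2.0
        start += t
    weighted = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in groups)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3.0 * (n_total + 1)
    # Tie correction; it is zero (and H undefined) if every value is identical.
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    return h / correction

# Toy samples standing in for per-match teammate report counts in three regions.
h_stat = kruskal_wallis_h([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
```

With three groups, H is compared against a chi-square distribution with 2 degrees of freedom, which matches the χ²(2) notation above.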


We move on to testing [H3.3] Reports on behavior that
largely affects the result of the match are more likely to be 
punished in Korea than in other regions. We plot the percent- 
age of punish decisions for behavior that directly affects the 
result of the match for each region in Figure 6. While such 
offenses are heavily punished in each region (over 80%), we
do find a difference. Korean reviewers are more likely to
perceive assisting enemy and intentional feeding as severe toxic
playing with much higher levels of agreement than other re- 
gions (for Punish, Overwhelming Majority, KR: 48%, NA: 
27%, EUW: 24%), which is affirmative support for H3.3. 
A Chi-Square test with Yates continuity correction reveals
that the effect of region on the percentage of Punish,
Overwhelming Majority in those categories is significant
(χ²(2, N = 1,955,297) = 83593.08, p < .0001).


Team-cohesion and Performance

Figure 7. The winning ratio for each category of toxic behavior with
95% confidence interval.


To test [H4.1] More reports come from matches where the
accused was on the losing team, we plot the winning ratio with
95% confidence interval for the different categories of toxic
play reported in Figure 7. From the figure, it is immediately
apparent that the winning ratio is clearly below 50%, even
though LoL uses a matchmaking system similar to Elo ranking
[24] that attempts to match players in a manner where
there is a 50% chance of winning for each team. In particular,
we see that the winning ratios for intentional feeding and
assisting the enemy are extremely low (under 15% for both).
Because the observed winning ratio is less than 0.5, more
reports come from losing teams, and H4.1 is supported.
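
A winning ratio with a 95% confidence interval can be sketched in Python as below. The paper does not say which interval method it used, so the normal-approximation (Wald) interval here is an assumption, and the counts are hypothetical rather than taken from the dataset.

```python
import math

def win_ratio_ci(wins, matches, z=1.96):
    """Winning ratio with a normal-approximation (Wald) confidence interval."""
    if matches <= 0:
        raise ValueError("matches must be positive")
    ratio = wins / matches
    half_width = z * math.sqrt(ratio * (1 - ratio) / matches)
    return ratio, max(0.0, ratio - half_width), min(1.0, ratio + half_width)

# Hypothetical counts for one category of toxic behavior.
ratio, low, high = win_ratio_ci(wins=150, matches=1000)
# H4.1-style check: is the whole interval below an even 50% chance?
below_even = high < 0.5
```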


As mentioned in the research questions section, lower team
cohesion leads to lower performance. Alternatively, a poor 
performance might trigger toxic behavior like cyberbullying 
and also might be a cause for the high number of reports in- 
volving a losing team. Cyberbullying offenses are explained 
by attribution theory in that when a toxic player recognizes a 
poor performance (e.g., even though the match is not decided, 
his team is losing) he looks for someone other than himself 
to place blame. Related to this, for example in the spamming 
category, players that were on the losing side of the match 
might attempt to attribute the loss to another member of the
team and attempt to punish him via the reporting system. 

To test [H4.2] There are more cases pardoned when the
accused was on the losing team than on the winning team, we
plot in Figure 8 the breakdown of pardon and punish decisions
as a function of whether the accused toxic player was on the
winning or losing side. If people regard somewhat innocuous
factors as significant reasons for defeat and report them as
toxic behavior, the pardoned ratio for defeats will be higher
than that for wins. As seen in Figure 8 however, we find that
the proportion of cases pardoned by crowds when losing is
less than that when winning. Thus, H4.2 is not supported, and
in fact the opposite is the case.

Figure 8. Pardoned ratio when losing vs. winning.


DISCUSSION AND CONCLUSION 

In this work we explored toxic playing and the reaction to 
it via crowdsourced decisions using a few million observed 
reports. Our large-scale dataset provides an opportunity to
explore several compelling theories from sociology and psy- 
chology that discuss the “whys” of toxic behavior. 


We first showed that there is relatively low participation 
in reporting toxic behavior and reconfirmed the impact of 
anonymity in CMC and cyberspace. This finding underlines 
the difficulties in relying solely on voluntary player reports. 
For example, on Facebook there is a button to report a
comment that violates cyber-etiquette. The low degree of
participation in social control that we found raises a
fundamental question about the design of such report-based
systems: if relatively few “victims” voluntarily report such
behavior, how effective can these systems truly be?
Interestingly, our finding
that an explicit request to report toxic behavior significantly 
increases the likelihood of reporting indicates that actively 
encouraging reporting should be considered in the design of 
systems to address toxic behavior. 


Next, we examined the vague nature of toxic behavior. This 
issue is repeatedly observed in online games [34]. With over 
10% of cases being pardoned in the Tribunal, it is clear that 
crowdsourcing using experienced players is useful in pro- 
tecting innocent victims who are wrongly reported by other 
players due to poor gaming skills or aggressive (but not 
toxic) linguistic behavior. The Tribunal could further im- 
prove the quality and efficiency of crowdsourced workers via 
several proposed mechanisms for quality control in crowd- 
sourced systems and augmentation with machine-learning so- 
lutions [11]. 


We then moved on to how group setting might influence
reporting behavior. We quantitatively show that in-group
favoritism decreases, and out-group hostility increases, the
willingness to report. This is rooted in competition among
players. Since competition is a common game design element
for enjoyment [52], our findings are applicable to most
online games. Even though a given game might not have a
form of team competition, a sense of belonging is frequently
derived from a small group of players, called guilds or parties 
in MMORPGs [22]. In a broad sense, homophily can appear 
not only in an explicit group but also in an implicit group [37]. 
The current work portends possible bias of observed reports 
in such settings. 


Most online games allow interactions between players. In this 
environment toxic playing is a serious issue that degrades user 
experience. Our work offers understanding of toxic playing 
and its victims based on LoL data, but much of the mechan- 
ics involved are typical game elements not unique to LoL. We 
also believe that with the growing trend of gamification (e.g., 
as applied to citizen science) our findings have broader appli- 
cation than traditional entertainment gaming. In fact, some 
gamified citizen science projects have already experienced
low-level toxicity [7]. As designers and scientists import more and
more gaming elements into their systems, they will most as- 
suredly be accompanied by less desirable elements of gaming 
culture, such as toxic behavior. 


We also see similarities between LoL and online communi- 
ties. Online communities, e.g., Reddit, allow anonymous user 
identities. They use unique user names which are not linked 
to real identity. Although there are differences between forms 
of toxic playing and cyberbullying in online communities, the 
disconnect between the real and virtual worlds is a common root
that both cyberbullying and toxic playing stem from. 


Beyond gaming, our work deepens understanding of team 
conflicts in general. This has great potential because teams 
are an essential building block of modern organizations. 
Also, in the current globally distributed workspace, collab- 
oration through an electronic channel is pervasive. This often 
results in a goal-oriented group that lacks social connections. 
Such settings can accelerate the tendency to blame others, fur- 
ther escalated by the individuals’ unfamiliarity with remote 
partners [19]. Competitive games in general, and the LoL Tri- 
bunal in particular, are thus a valuable asset to capture group 
conflicts in the virtual space and test effective treatments for 
them. We believe that solutions for toxic play in team
competition online games could have a huge impact on real-world
scenarios, and not just virtual spaces. 


For the overall CHI community, while ethnography is tradi- 
tionally considered as one of the best methods to understand 
people, we show that studying human behavior with big data 
and testable hypotheses works well. We hope that this helps 
accelerate the discovery of big data’s value by the CHI com- 
munity. 


Caveats and Limitations 

Although our findings confirm several theories on cyberbul- 
lying and toxic behavior, we note that there are limitations 
and caveats that must be considered. First, although there 
is reason to believe that gamers generally behave the same 
in-game as they do in the real-world [12], the fact remains 
that our dataset is drawn from a game. At minimum, this 
means applying our results to other domains must be done
carefully [30]. Further, our findings are from LoL in particular,
and there are domain specific concerns that might not be
present in other games even within the same genre. For exam- 
ple, SMITE is currently the 3rd most popular MOBA in the 
market, but completely lacks an all-talk chat mode. Addition- 
ally, SMITE and Dota 2 (the 2nd most popular MOBA) both 
have integrated voice communication, which might intensify 
or soften cyberbullying with feelings of social presence. 


Next, the social network structure among players, “friend” re- 
lationships, is not considered in this work due to lack of data. 
We thus do not incorporate social relationships among specific
players in building our hypotheses. As recent studies reveal 
that playing together with friends influences performance and 
toxicity [36, 45], we believe that more detailed data including 
players’ social networks would enable testing of more sophis- 
ticated hypotheses. 


Finally, although our dataset is large-scale and quite rich, it 
is anonymized and thus prevents us from knowing how toxic 
players behaved before and after the matches were aggregated 
into their case and a decision was made. We also lack knowl- 
edge of how other players in the game typically behave. Thus, 
while we were able to provide significant support for several 
theories, there are questions for which answers continue to 
elude us.